# NeuroRenderedFake Database
## NeuroRenderedFake: A Challenging Benchmark to Detect Fake Images Generated by Advanced Neural Rendering Methods
A Large Dataset of Neural-rendered Fake Images: Using NeRF- and 3DGS-based neural rendering techniques, we generate a variety of realistic 3D scene representations and render 2D fake images from various viewing angles. Additionally, we synthesize fake images containing combined artifacts by integrating generative models with neural rendering technologies. In contrast to existing databases that focus exclusively on fake images from generative models such as GANs and diffusion models (DMs), our diverse collection expands the scope of fake image detection and is the first to include images derived from neural rendering-based synthesized or edited 3D scenes.

## Description of the Data

Our dataset is organized as follows:

```
Root Dir/
//////////////////// Test Groups 1-11
  -GAN+DM/
    -artifact
    -coco_eval
    -...
  -gaussiantalker/
    -raw
    -synthesis
  -genefacepp/
    -raw
    -synthesis
  -gsgen/
    -rendered_imgs
  -Instruct-N2N_Instruct-GS2GS/
    -inria_gs-igs2gs
    -inria_gs-in2n
    -...
  -nerfstudio_proj/
    -blender
    -dnerf
    -...
  -Pix2Nerf/
    -rendered_faces
    -rendered2
  -SketchFaceNerf/
    -rendered_faces
  -SplattingAvatar/
    -real_images
    -rendered_images
  -stableDreamfusion/
    -rendered_results
//////////////////// Test Groups 12-15
  -videocrafter1/
    -0_real
    -1_fake
  -zeroscope/
    -0_real
    -1_fake
  -opensora/
    -0_real
    -1_fake
  -runway_gen4_turbo/
    -fake
    -real
  -sora_frames/
    -sora_tiktok
    -sora_websit
    -washed
  -liveportrait/
    -raw
    -synthesis
///////////////////// GenImage
  -GenImage/
    -ADM
    -BigGAN
  -README.md
```
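The sub-folder names in the listing suggest a real/fake split for most groups. As a minimal sketch, the Python helper below infers a binary label from a path relative to the root directory. The mapping from folder names to labels (for example, `raw` as real and `synthesis` as fake) is an assumption based on the names alone, and `infer_label` is a hypothetical helper, not part of any released tooling. Folders without an obvious label, such as those under `sora_frames`, return `None`.

```python
from pathlib import PurePosixPath
from typing import Optional

# ASSUMPTION: these folder-name-to-label mappings are guessed from the
# directory listing; verify them against the paper before relying on them.
REAL_NAMES = {"0_real", "real", "raw", "real_images"}
FAKE_NAMES = {
    "1_fake", "fake", "synthesis", "rendered_images",
    "rendered_imgs", "rendered_faces", "rendered_results",
}


def infer_label(relative_path: str) -> Optional[int]:
    """Return 0 (real), 1 (fake), or None when no folder name is recognized."""
    parts = PurePosixPath(relative_path).parts
    for part in parts[:-1]:  # skip the file name itself
        if part in REAL_NAMES:
            return 0
        if part in FAKE_NAMES:
            return 1
    return None
```

For instance, `infer_label("videocrafter1/0_real/frame.png")` yields `0`, while a path under `sora_frames/washed` yields `None` and needs a label assigned by hand.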

### File Formats

The images are stored in common formats (.png, .jpg, .jpeg, ...). The dataset, covering both the training and testing sets, contains in total:
```
-3,685,673 images.
```
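To check a local copy against this total, a sketch like the following counts image files per top-level folder. The extension set is an assumption that covers only the three formats named above (the original list ends with "..."), and `count_images` is a hypothetical helper.

```python
from collections import Counter
from pathlib import Path

# ASSUMPTION: only the formats named in this README; extend if others occur.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def count_images(root) -> Counter:
    """Count image files under `root`, grouped by top-level folder name."""
    root = Path(root)
    counts = Counter()
    for path in root.rglob("*"):
        # Compare suffixes case-insensitively so ".JPG" is also counted.
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
            top_level = path.relative_to(root).parts[0]
            counts[top_level] += 1
    return counts
```

Summing the values of the returned counter gives the overall image count, which can then be compared with the figure stated above.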

## Data Collection Methodology

We describe the details of our collected dataset in Tab. 2 and Tab. 3 of the main paper, and compare our database with related and popular fake-detection databases in Tab. 1. Notably, generating 3D scenes and projecting them onto a 2D view plane is costly, and the scenes need to be built from scratch. As a result, the generation cost of our database, measured in GPU hours, is about four times higher than that of GenImage [<a href="#genimage">1</a>].

### Protocols for the Training Dataset
  - A: For the real images, we randomly sample 20,000 images from each of the folders {afhq, celebahq, lsun} of the ArtiFact [<a href="#artiface">2</a>] database, and collect all 4,318 images from the landscape folder and all 1,336 images from the metfaces folder. We therefore acquire a total of 65,654 real images. For GAN-generated fake images, we collect 10k, 10k, 7k, 10k, 15k, and 15k images from the folders of BigGAN, Gans-former, GauGAN, ProjectedGAN, StyleGAN3, and Taming Transformer, respectively.
  - B: The real images are the same as in A, while a total of 66,896 fake images generated by 6 DMs are selected. Specifically, we collect 10k, 896, 20k, 6k, 20k, and 10k images from the folders of Glide, DDPM, Latent Diffusion, Palette, Stable Diffusion, and VQ Diffusion, respectively.
  - C: For the real class, we use all 69,377 real images in A∼J. For the rendered class, we sample 12,000 images in A∼J for each of the methods I, II, III, IV, and V. We therefore acquire a total of 60,000 rendered images.
  - D: For the real class, we use all 69,377 real images in A∼J. For the rendered class, we use all 40,734 splatfacto-rendered images in A∼J and all 10,785 C3dgs-rendered images in G. We therefore acquire a total of 51,519 rendered images.
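The per-folder sampling described in protocol A can be sketched as follows. This is a minimal illustration, not the authors' script: `sample_split`, the quota dictionary, and the fixed seed are assumptions, and a quota of `None` is used here to mean "take every image in the folder".

```python
import random
from typing import Dict, List, Optional

# Quotas for the real images of protocol A, as stated above.
# None means "use all images in the folder" (a convention of this sketch).
PROTOCOL_A_REAL_QUOTAS: Dict[str, Optional[int]] = {
    "afhq": 20_000,
    "celebahq": 20_000,
    "lsun": 20_000,
    "landscape": None,  # all 4,318 images
    "metfaces": None,   # all 1,336 images
}


def sample_split(
    files_by_folder: Dict[str, List[str]],
    quotas: Dict[str, Optional[int]],
    seed: int = 0,
) -> List[str]:
    """Randomly pick `quota` files per folder without replacement."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    selected: List[str] = []
    for folder, quota in quotas.items():
        files = sorted(files_by_folder[folder])  # sort for determinism
        if quota is None or quota >= len(files):
            selected.extend(files)  # take everything in the folder
        else:
            selected.extend(rng.sample(files, quota))
    return selected


# Sanity check of the stated total: 3 x 20,000 + 4,318 + 1,336 real images.
assert 3 * 20_000 + 4_318 + 1_336 == 65_654
```

Sorting the file list before sampling keeps the result independent of directory traversal order, so the same seed gives the same split on any machine.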

### Performance Evaluation Test Groups
  - Group 1 (I∼V): For the real class, we use all the real images from K, L, M, N, O. For the fake class, we use all the images from K, L, M, N, O rendered by each of the methods I, II, III, IV, and V.
  - Group 2 (VI∼VIII): For the real class, we use all the real images from K, L, M, N, O. For the fake class, we use all the images from K, L, M, N, O rendered by each of the methods VI, VII, and VIII.
  - Pix2NeRF [<a href="#pix2nerf">3</a>]: For the real class, we use all 70,000 images in the ffhq folder of the ArtiFact [<a href="#artiface">2</a>] database. For the fake class, we render 96,000 (1,000 × 96) images: we reconstruct 1,000 identities and render 96 images from different views for each identity.
  - SketchFaceNeRF [<a href="#sketchbased">4</a>]: For the real class, we use all 70,000 images in the ffhq folder of the ArtiFact [<a href="#artiface">2</a>] database. For the fake class, we render 90,000 (60 × 60 × 25) images: we use 60 sketches to style-transfer 60 identities and render 25 images from different views for each style-transferred head.
  - DreamFusion [<a href="#dreamfusion">5</a>]: For the real class, we randomly sample 10,000 images from the imagenet folder of the ArtiFact [<a href="#artiface">2</a>] database. For the fake class, we render 10,600 (106 × 100) images: we use 106 prompts for generation and render 100 images from different views for each generated 3D scene.
  - GSGEN [<a href="#gsgen">6</a>]: For the real class, we randomly sample 10,000 images from the imagenet folder of the ArtiFact [<a href="#artiface">2</a>] database. For the fake class, we render 9,540 (106 × 90) images: we use 106 prompts for generation and render 90 images from different views for each generated 3D scene.
  - Instruct-N2N [<a href="#instruct-nerf2nerf">7</a>]: For the real class, we use all 3,174 real images that were used to successfully train nerfacto (III) on datasets K, L, M, N, O. For the fake class, we generate 40,559 edited images from the nerfacto-generated 3D scenes.
  - Instruct-GS2GS [<a href="#instruct-gs2gs">8</a>]: For the real class, we use all 3,174 real images that were used to successfully train splatfacto (VII) on datasets K, L, M, N, O. For the fake class, we generate 40,559 edited images from the splatfacto-generated 3D scenes.
  - SplattingAvatar [<a href="#splattingavatar">9</a>]: For the real class, we use all 33,728 images of 14 identities (10 head identities and 4 full-body identities) provided by [<a href="#splattingavatar">9</a>]. For the fake class, we generate 33,715 rendered images for the 14 identities.

  - GeneFace++ [<a href="#genface++">10</a>]: We use 29 videos of different identities (the source videos are downloaded from YouTube, e.g., [subject1](https://www.youtube.com/watch?v=OBG50aoUwlI&t=335s), [subject3](https://www.youtube.com/watch?v=hPdpemrGmg8&t=93s), [subject2](https://www.youtube.com/watch?v=rLQ1V_H7vh4&t=92s), [subject4](https://www.youtube.com/watch?v=WPtkQ-Q7hfc&t=1176s), [subject5](https://www.youtube.com/watch?v=2vEzFQNJazE), [subject6](https://www.youtube.com/watch?v=rnfWgrUoZiI&t=87s), [subject7](https://www.youtube.com/watch?v=oEd1FBW3H70&t=118s), ...) to train the 3D representations of speaking heads. The original speech of the different identities covers various languages, and this multilingual property adds to the diversity of the dataset. We use 2 different methods to generate the fake speech videos: 1) 14 identities speak the content of the other 13 identities, driven by the extracted audio, which yields 182 (14 × 13) fake videos; 2) 15 identities speak predefined content using 17 multilingual audio inputs, which yields 255 (15 × 17) fake videos. For the real class, we sample all frames from each real video and generate 88,737 images. For the fake class, we sample all frames from each fake video and generate 452,653 images.
  - Gaussiantalker [<a href="#gaussiantalker">11</a>]: We extend the real video dataset with one more identity and apply the same videos and the same rules 1) and 2) that were used to create the GeneFace++ fake videos, replacing only the GeneFace++ method with Gaussiantalker, which uses 3D Gaussian Splatting. For the real class, we extract all frames from each real video and generate 94,810 images. For the fake class, we extract all frames from each fake video and generate 411,401 images.
  - LivePortrait [<a href="#liveportrait">12</a>]: We randomly select 24 of the 29 downloaded identities and use the LivePortrait method to generate fake videos, driving them with the same examples provided by LivePortrait. For the real class, we extract all frames from each real video and generate 59,260 images. For the fake class, we extract all frames from each fake video and generate 88,466 images.
  - SORA [<a href="#sora">13</a>] frames: For the real class, we randomly sample 60,000 images from the coco folder of the ArtiFact [<a href="#artiface">2</a>] database. For the fake class, we collect 94 publicly released videos generated by SORA and randomly crop a total of 60,531 images of 512 × 512 pixels from the frames of these SORA-generated videos.
  - Runway [<a href="#runway">14</a>]: We randomly select 156 real videos from InternVid-10m [<a href="#internvid-10m">16</a>] and 44 real videos from the Youtube-8m [<a href="#youtube8m">15</a>] dataset. To generate fake videos with the Runway Gen4-Turbo model, we randomly select another 161 videos from InternVid-10m and 35 videos from Youtube-8m, and randomly select one image from each. We use two methods to generate the fake videos: 1) for images from Youtube-8m, we use the image-to-video method; 2) for images from InternVid-10m, we use the text&image-to-video method, where the text is the caption of the video from InternVid-10m. Because each fake video generated by Runway runs at 24 fps and lasts 5 seconds, with about 121 frames in total, we extract 121 frames from each real video for the real class and generate 23,334 images. For the fake class, we extract all frames from each fake video and generate 23,353 images.
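Several of the totals above are stated as products (for example, identities × views). As a small arithmetic check, the sketch below recomputes them. The factors and totals are copied from the descriptions above; `STATED_COUNTS` and `verify_counts` are hypothetical names used only for illustration.

```python
from math import prod

# (factors, stated total) pairs copied from the test-group descriptions.
STATED_COUNTS = {
    "Pix2NeRF images": ((1_000, 96), 96_000),        # identities x views
    "SketchFaceNeRF images": ((60, 60, 25), 90_000), # sketches x identities x views
    "DreamFusion images": ((106, 100), 10_600),      # prompts x views
    "GSGEN images": ((106, 90), 9_540),              # prompts x views
    "GeneFace++ videos, method 1": ((14, 13), 182),
    "GeneFace++ videos, method 2": ((15, 17), 255),
}


def verify_counts(table):
    """Return the entries whose factor product differs from the stated total."""
    return [
        name
        for name, (factors, total) in table.items()
        if prod(factors) != total
    ]


mismatches = verify_counts(STATED_COUNTS)
print(mismatches)
```

An empty list means every stated product is consistent with its factors.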


## References

[1] <a name="genimage"></a> M. Zhu, H. Chen, Q. Yan, X. Huang, G. Lin, W. Li, Z. Tu, H. Hu, J. Hu, and Y. Wang, “Genimage: A million-scale benchmark for detecting ai-generated image,” Advances in Neural Information Processing Systems, vol. 36, 2024.

[2] <a name="artiface"></a> Md Awsafur Rahman, Bishmoy Paul, Najibul Haque Sarker, Zaber Ibn Abdul Hakim, and Shaikh Anowarul Fattah. Artifact: A large-scale dataset with artificial and factual images for generalizable and robust synthetic image detection. In 2023 IEEE International Conference on Image Processing (ICIP), pages 2200–2204. IEEE, 2023.

[3] <a name="pix2nerf"></a> Shengqu Cai, Anton Obukhov, Dengxin Dai, and Luc Van Gool. Pix2nerf: Unsupervised conditional p-gan for single image to neural radiance fields translation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 3981–3990, 2022.

[4] <a name="sketchbased"></a> Gao Lin, Liu Feng-Lin, Chen Shu-Yu, Jiang Kaiwen, Li Chunpeng, Yukun Lai, and Fu Hongbo. Sketchfacenerf: Sketch-based facial generation and editing in neural radiance fields. ACM Transactions on Graphics, 2023.

[5] <a name="dreamfusion"></a> Ben Poole, Ajay Jain, Jonathan T Barron, and Ben Mildenhall. Dreamfusion: Text-to-3d using 2d diffusion. arXiv preprint arXiv:2209.14988, 2022.

[6] <a name="gsgen"></a> Zilong Chen, Feng Wang, and Huaping Liu. Text-to-3d using gaussian splatting. arXiv preprint arXiv:2309.16585, 2023.

[7] <a name="instruct-nerf2nerf"></a> Ayaan Haque, Matthew Tancik, Alexei A Efros, Aleksander Holynski, and Angjoo Kanazawa. Instruct-nerf2nerf: Editing 3d scenes with instructions. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 19740–19750, 2023.

[8] <a name="instruct-gs2gs"></a> Jing Wu, Jia-Wang Bian, Xinghui Li, Guangrun Wang, Ian Reid, Philip Torr, and Victor Adrian Prisacariu. Gaussctrl: Multi-view consistent text-driven 3d gaussian splatting editing. arXiv preprint arXiv:2403.08733, 2024.

[9] <a name="splattingavatar"></a> Zhijing Shao, Zhaolong Wang, Zhuang Li, Duotun Wang, Xiangru Lin, Yu Zhang, Mingming Fan, and Zeyu Wang. Splattingavatar: Realistic real-time human avatars with mesh-embedded gaussian splatting. arXiv preprint arXiv:2403.05087, 2024.

[10] <a name="genface++"></a> Zhenhui Ye, Jinzheng He, Ziyue Jiang, Rongjie Huang, Jiawei Huang, Jinglin Liu, Yi Ren, Xiang Yin, Zejun Ma, and Zhou Zhao. Geneface++: Generalized and stable real-time audio-driven 3d talking face generation. arXiv preprint arXiv:2305.00787, 2023.

[11] <a name="gaussiantalker"></a> S. Kim, “Gaussiantalker: Real-time high-fidelity talking head synthesis with audio-driven 3d gaussian splatting,” in 32nd ACM Multimedia conference, MM 2024. Association for Computing Machinery, Inc, 2024.

[12] <a name="liveportrait"></a> J. Guo, D. Zhang, X. Liu, Z. Zhong, Y. Zhang, P. Wan, and D. Zhang, “Liveportrait: Efficient portrait animation with stitching and retargeting control,” arXiv preprint arXiv:2407.03168, 2024.

[13] <a name="sora"></a> OpenAI. Creating video from text, 2024. https://openai.com/index/sora/. 

[14] <a name="runway"></a> Runway Gen-4, 2025, https://runwayml.com/research/introducing-runway-gen-4.

[15] <a name="youtube8m"></a> Abu-El-Haija, Sami, et al. "Youtube-8m: A large-scale video classification benchmark." arXiv preprint arXiv:1609.08675 (2016).

[16] <a name="internvid-10m"></a> Wang, Yi, et al. "Internvid: A large-scale video-text dataset for multimodal understanding and generation." arXiv preprint arXiv:2307.06942 (2023).

